Distribution free bounds for relational classification

Authors

  • AMIT DHURANDHAR
  • ALIN DOBRA
  • A. Dobra
Abstract

Statistical Relational Learning (SRL) is a sub-area in Machine Learning which addresses the problem of performing statistical inference on data that is correlated and not independently and identically distributed (i.i.d.), as is generally assumed. For the traditional i.i.d. setting, distribution free bounds exist, such as the Hoeffding bound, which are used to provide confidence bounds on the generalization error of a classification algorithm given its hold-out error on a sample size of N. Bounds of this form are currently not present for the type of interactions that are considered in the data by relational classification algorithms. In this paper we extend the Hoeffding bounds to the relational setting. In particular, we derive distribution free bounds for certain classes of data generation models that do not produce i.i.d. data and are based on the type of interactions that are considered by relational classification algorithms that have been developed in SRL. We conduct empirical studies on synthetic and real data which show that these data generation models are indeed realistic and the derived bounds are tight enough for practical use.
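For orientation, the classical i.i.d. Hoeffding bound mentioned in the abstract can be sketched as follows. This is only the standard two-sided bound for a 0/1 loss, P(|hold-out error - true error| >= eps) <= 2 exp(-2 N eps^2); it is not the relational extension derived in the paper, and the function names `hoeffding_epsilon` and `hoeffding_interval` are hypothetical names chosen for this illustration.

```python
import math


def hoeffding_epsilon(n, delta):
    """Half-width eps with P(|holdout_err - true_err| >= eps) <= delta.

    Assumes n i.i.d. hold-out examples and a loss bounded in [0, 1]
    (the classical setting, NOT the relational one from the paper).
    Obtained by solving 2 * exp(-2 * n * eps**2) = delta for eps.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hoeffding_interval(holdout_error, n, delta):
    """Confidence interval for the generalization error, clipped to [0, 1]."""
    eps = hoeffding_epsilon(n, delta)
    return max(0.0, holdout_error - eps), min(1.0, holdout_error + eps)


if __name__ == "__main__":
    # Hypothetical numbers: 12% hold-out error on 1000 examples, 95% confidence.
    low, high = hoeffding_interval(0.12, 1000, 0.05)
    print(f"generalization error in [{low:.3f}, {high:.3f}] with prob. >= 0.95")
```

With N = 1000 and delta = 0.05 the half-width is about 0.043, and it shrinks at the rate 1/sqrt(N). The bound relies on independence of the hold-out examples, which is exactly the assumption that fails for relational data.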

Similar resources

Generalization bounds for incremental search classification algorithms

This paper presents generalization bounds for a certain class of classification algorithms. The bounds presented take advantage of the local nature of the search that these algorithms use in order to obtain bounds that are better than those that can be obtained using VC type bounds. The results are applied to well-known classification algorithms such as classification trees and the perceptron.

Moderated Class membership Interchange in Iterative Multi relational Graph Classifier

Organizing information resources into classes helps significantly in searching in massive volumes of on line documents available through the Web or other information sources such as electronic mail, digital libraries, corporate databases. Existing classification methods are often based only on own content of document, i.e. its attributes. Considering relations in the web document space brings bet...

Algorithms and Applications for Universal Quantification in Relational

Queries containing universal quantification are used in many applications, including business intelligence applications and in particular data mining. We present a comprehensive survey of the structure and performance of algorithms for universal quantification. We introduce a framework that results in a complete classification of input data for universal quantification. Then we go on to identify th...

Risk bounds for Statistical Learning

We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with other ways of measuring the "size" of a clas...

Data Mining using Nonmonotonic Connectionist Expert Systems

An application of Nonmonotonic Connectionist Expert Systems (NCESs) in mining classification rules from large relational databases is presented. NCESs are hybrid learning systems that can acquire symbolic knowledge of a nonmonotonic domain, represented using nonmonotonic inheritance networks. This initial knowledge can be refined using connectionist learning techniques and a set of classified exam...

Publication date: 2008